Description

The present invention relates in general to data
processing systems and in particular to the distribution
of information from a network such as the Internet
to a large number of data processing systems.

The Internet has become a cultural fixture as a 
source of both information and entertainment. Many 
businesses are creating Web sites as an integral part of 
their marketing efforts, informing consumers of the products
or services offered by the business or providing other
information seeking to engender brand loyalty. Many
federal, state, and local government agencies are also 
employing Internet sites for informational purposes, par- 
ticularly agencies which must interact with virtually all 
segments of society such as the Internal Revenue Serv- 
ice and secretaries of state. Operating costs may be re- 
duced by providing informational guides and/or search- 
able databases of public records online. 

Currently, the most commonly employed method of 
transferring data over the Internet is to employ the World 
Wide Web environment, also called simply "the Web". 
Other Internet resources exist for transferring informa- 
tion, such as File Transfer Protocol (FTP) and Gopher, 
but have not achieved the popularity of the Web. In the 
Web environment, servers and clients effect data trans- 
action using the Hypertext Transfer Protocol (HTTP), a 
known protocol for handling transfer of various data files 
(e.g., text, still graphic images, audio, motion video, 
etc.). Information is formatted for presentation to a user 
by a standard page description language, the Hypertext 
Markup Language (HTML). In addition to basic presentation
formatting, HTML allows developers to specify
"links" to other Web resources, identified by a Uniform 
Resource Locator (URL). An URL is a special syntax 
identifier defining a communications path to specific in- 
formation. Each logical block of information accessible 
to a client, called a "page", is identified by an URL. 

Retrieval of information on the Web is generally ac- 
complished with an HTML-compatible "browser", a pro- 
gram capable of submitting a request for information 
identified by an URL, at the client machine. The request 
is submitted to a server connected to the client and may 
be handled by a series of servers to effect retrieval of 
the requested information. The information is provided 
to the client formatted according to HTML. 

The largest segment of the consuming public does 
not currently have access to these Web resources. Such 
consumers are typically either unable or unmotivated to 
acquire both the requisite hardware and software and 
the necessary computer skills for taking advantage of 
these resources. While most computers currently being 
sold come preloaded with Internet access facilities, in- 
cluding Web browsers, a substantial number of house- 
holds do not have personal computers. There is a need 
for low cost data processing systems which are simple 
to operate, allowing users without computer skills the 
opportunity to access the Internet. This need is being
addressed, to some extent, by "set-top" systems, such
as for example "WebTV." These systems allow a televi- 
sion to be rapidly switched between providing conven- 
tional television viewing, either broadcast or cable, and 
providing a user interface for Internet access. The user's
television thus becomes part of a Web appliance. 

In designing a low cost, simple data processing system
for a Web appliance, however, it is necessary to presume
that the target user is unsophisticated and/or inexperienced.
Therefore, the operation of the data
processing system must be both simple and intuitive,
requiring little or no technical sophistication on the part
of the user. In this regard, many of the features of conventional
Web browsers must be adapted to be transparent
to the user when implemented in a Web appliance.

One feature of Web browsers which would be particularly
advantageous to implement in connection with
Web appliances is off-line browsing. Large traffic demands
to specific Web sites can make access to such
sites difficult. Off-line browsing allows information at the
site to be retrieved during off-peak periods without contemporaneous
user interaction at the client for subsequent
off-line viewing by the user. Off-line browsing is a
process of viewing Web pages cached in a local memory,
such as a hard drive, without connection to the Web
site from which those pages originate. The pages are
typically retrieved from the originating Web site by off-peak
retrieval, or retrieval during periods when traffic to
the site is at a minimum.

Typically, a scheduling utility allows a user to retrieve
specific Web pages for storage on the user's hard
drive and later viewing. While an off-line browser may
provide benefits to an individual user, it cannot
support optimization of communications between a
group of clients and the Web. Individual clients, each
employing off-peak information retrieval, may still tax
communications resources when connected to the
same server or group of servers. Such a situation will
particularly arise where substantial numbers of Web appliances
access the Internet through a single service
provider. In addition to practical constraints on off-peak
information retrieval which complicate off-line browsing
in such environments, it is anticipated that service providers
will limit the time allotted for off-peak information
retrieval for off-line browsing.

It would be desirable, therefore, to provide an automatic
and more efficient feature for downloading information
from popular Internet sites to specific groups of
users. Use of off-peak information retrieval by multiple
users, even if staggered, creates bottlenecks between
the server and the Internet and requires additional resources
to satisfy the bandwidth requirements. It is further
desirable, therefore, to provide a mechanism for
eliminating the bandwidth requirements imposed. It
would also be advantageous for the mechanism to minimize
transfer time both from the source and to individual
users, and to require minimal resources at the server.

Accordingly, the present invention provides a method
for efficient distribution of precached data from a
server to a plurality of user client systems, comprising:

receiving a request for data from a client;
identifying requested data which is not already
present in said client;
selecting a portion of said identified requested data
based on a probability that a user will access said
selected portion of said identified requested data,
wherein said selected portion of identified requested
data is subject to a size constraint; and
transmitting to the client in compressed form said
selected portion of identified requested data.
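
By way of illustration only, the four steps above may be sketched in Python as follows. The `Page` record, the `access_probability` field, the function names, and the use of zlib for compression are assumptions introduced for this sketch; the description does not prescribe a particular data structure, probability estimate, or compression scheme.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    body: bytes
    access_probability: float  # hypothetical estimate in [0, 1]


def select_for_client(requested, client_urls, max_bytes):
    """Pick the pages most likely to be viewed that the client lacks."""
    # Identify requested data not already present in the client.
    missing = [p for p in requested if p.url not in client_urls]
    # Rank by access probability, then fill the size budget greedily.
    missing.sort(key=lambda p: p.access_probability, reverse=True)
    selected, used = [], 0
    for page in missing:
        if used + len(page.body) <= max_bytes:
            selected.append(page)
            used += len(page.body)
    return selected


def transmit(selected):
    """Return (url, compressed body) pairs ready to send to the client."""
    return [(p.url, zlib.compress(p.body)) for p in selected]
```

In this sketch the size constraint is a byte budget, and a page that does not fit is skipped in favour of smaller, less probable pages. A file count or a time limit could serve as the constraint in the same loop.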

Said step of receiving a request for data from a cli- 
ent typically comprises receiving an off-peak informa- 
tion retrieval request for a Web site, or a browsing re- 
quest for a Web site from a registration list of browsing 
requests for a plurality of Web sites. In the latter case,
the method preferably further comprises collecting a 
registration list of browsing requests for a plurality of 
Web sites. This list can then be culled as appropriate to 
eliminate abandoned browsing requests. 

In one preferred embodiment, said step of receiving
a request for data comprises receiving a request for at
least one page of data from a Web site; and said step
of selecting a portion of said identified requested data
for transmission to said system further comprises selecting
pages linked to said at least one page of data for
transmission to said system. Said step of identifying requested
data which is not already present in said client
further comprises identifying pages linked to said requested
page from said Web site which are not already
present in said client; and said step of selecting a portion
of said identified requested data for transmission to said
client further comprises selecting pages linked to said
requested page from said Web site which are likely to
be accessed by a user. Another possibility is for said
step of selecting a portion of said identified requested
data to further comprise selecting portions, not already
in said client, of pages linked to said at least one page
which are likely to be accessed by a user; and said step
of transmitting said selected portion of said identified requested
data further comprises transmitting said selected
portions of pages to said client.

In another preferred embodiment, said step of receiving
a request for data from a client comprises receiving
a request for data from a Web site; and said step
of selecting a portion of said identified requested data
for transmission to said client further comprises selecting
a portion of said data from said Web site which is
not already present in said client for transmission to said
client. Typically, complete pages from said Web site
which are not already present in said client and are likely
to be accessed by a user are selected, subject
to a size constraint such as a number of files, quantity
of bytes, or a time limit.



It is preferred that the method of the invention further
comprises the steps of: receiving a second request
for said data from said client after a period of time; identifying
remaining data from said identified requested data
which is not already present in said client; selecting
a portion of said remaining data for transmission to said
client; compressing said selected portion of said identified
requested data; and transmitting said selected remaining
data to said client in compressed form.

The invention further provides apparatus for efficient
distribution of precached data to a plurality of user
client systems, comprising:

receiving means for receiving a request for data
from a client;
identifying means for identifying requested data
which is not already present in said client;
selection means for selecting a portion of said identified
requested data based on a probability that a
user will access said selected portion of said identified
requested data, wherein said selected portion
of identified requested data is subject to a size constraint;
and
transmission means for transmitting said selected
portion of said identified requested data to said client
in compressed form.

In a preferred embodiment, the apparatus further
comprises compression means for compressing said
selected portion of said identified requested data for
transmission to said client; said receiving means further
comprises means for receiving a request for data or a
page of data from a Web site; and said selection means
further comprises means for selecting a portion of said
data or pages from said Web site which is not already
present in said client for transmission thereto.

The aim is that the selected data or pages are likely
to be accessed by a user, for example pages linked to
the at least one page of data for transmission to said
client. Thus in one preferred embodiment said identifying
means further comprises means for identifying pages
or portions of pages linked to said requested page
from said Web site which are not already present in said
client; and said selection means further comprises means for
selecting pages or portions of pages not already in the
client linked to said requested page from said Web site
which are likely to be accessed by a user.

Preferably the apparatus further comprises receiving
means for receiving a second request for said data
from said system after a period of time; identifying
means for identifying remaining data from said identified
requested data which is not already present in said client;
selection means for selecting a portion of said remaining
data for transmission to said client; and transmission
means for transmitting said selected remaining
data to said client in a compressed form.

The invention further provides a computer program
product for use with a data processing system, comprising:

a computer usable medium;
first instructions on said computer usable medium
for receiving a request for data from a client system;
second instructions on said computer usable medium
for identifying requested data which is not already
present in said client system;
third instructions on said computer usable medium
for selecting a portion of said identified requested
data for transmission to said client system; and
fourth instructions on said computer usable medium
for transmitting said selected portion of said identified
requested data to said client system in compressed
form.

Typically the computer usable medium is a hard 
disk drive. 

The invention further provides a method of precaching
Web pages, comprising:

maintaining a registration list of requested Web site
home pages;
for each requested Web site home page, creating
a cached Web site;
responsive to a request for a Web site home page
within said registration list, identifying a changed
portion of a corresponding cached Web site; and
transmitting a portion of said changed portion of a
corresponding cached Web site, said transmitted
portion being subject to a size constraint.
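
The description does not state how the changed portion of a cached Web site is identified. One possible approach, offered purely as an assumption, is to compare a content digest of each cached page with the digest of the copy last delivered to the client. In the Python sketch below, `cached_site`, `client_digests`, and both function names are hypothetical.

```python
import hashlib


def digest(body: bytes) -> str:
    """Content fingerprint used to detect that a page has changed."""
    return hashlib.sha256(body).hexdigest()


def changed_pages(cached_site, client_digests):
    """Return URLs whose cached content the client lacks or holds stale."""
    return [url for url, body in cached_site.items()
            if client_digests.get(url) != digest(body)]


def portion_to_send(cached_site, client_digests, max_bytes):
    """Collect changed pages, in cache order, until the byte budget is hit."""
    sent, used = {}, 0
    for url in changed_pages(cached_site, client_digests):
        body = cached_site[url]
        if used + len(body) > max_bytes:
            break
        sent[url] = body
        used += len(body)
    return sent
```

Pages left over when the budget is exhausted remain "changed" from the client's point of view, so a later request would pick them up.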

Preferably said step of creating a cached Web site
further comprises: identifying pages linked to said Web
site home page which are likely to be accessed by a
user; retrieving said requested Web site home page and
said identified pages linked to said Web site home page;
and compressing said requested Web site home page
and said identified pages linked to said Web site home
page. The pages which are likely to be accessed by a
user are preferably identified as all pages referenced by
said Web site home page which are most frequently accessed
by users; and those pages referenced by a page
referenced by said Web site home page, wherein a
breadth first priority is employed to identify pages likely
to be accessed by a user.
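
A breadth first priority of this kind may be illustrated with the short Python sketch below. The `links` mapping (page URL to the URLs it references) and the `limit` parameter are assumptions made for the example; pages nearer the home page are visited before pages further away.

```python
from collections import deque


def breadth_first_pages(links, home, limit):
    """Return up to `limit` URLs reachable from `home`, nearest first."""
    order, seen, queue = [], {home}, deque([home])
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        # Enqueue pages referenced by this page that are not yet scheduled.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

Within one level, pages are taken in the order given by `links`; a caller could sort each list by access frequency so that the most frequently accessed pages come first.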

In the preferred embodiment, said step of transmit- 
ting a portion of said changed portion of a corresponding 
cached Web site comprises transmitting complete pag- 
es unless a page exceeds said size constraint, and said 
step of maintaining a registration list of requested Web 
site home pages further comprises adding new brows- 
ing requests and deleting existing browsing requests 
which have become abandoned. 

The invention further provides a method of efficiently
distributing data to a plurality of users, comprising:

receiving requests for data from the plurality of users;
based on the received requests, selecting data likely
to be accessed by the plurality of users from a
larger pool of available data;
precaching the selected data at a server system; and
upon connection of an individual user within the plurality
of users to the server system, transmitting a
portion of data from the precached data to the individual
user in compressed form, the portion of precached
data selected on the basis of a prior received
request from the individual user.

Viewed from another aspect the invention provides
a method in a data processing system of efficient distribution
of precached data to a plurality of users, comprising:

receiving a request for data from a system;
identifying requested data which is not already
present in said system;
selecting a portion of said identified requested data
for transmission to said system, said selection
based on a probability that a user will access said
selected portion of said identified requested data
and limited to a selected portion of said identified
requested data within a size constraint; and
transmitting said selected portion of said identified
requested data to said system in a compressed
form, wherein precached data may be efficiently
distributed to a plurality of users.

Preferably said step of receiving a request for data
further comprises receiving a request for data or a page
of data from a Web site, and said step of selecting a
portion of said identified requested data for transmission
to said system further comprises selecting a portion of
said data from said Web site, for example pages linked
to said at least one page of data (not necessarily all at
the same Web site), which is not already present in said system
and which is likely to be accessed by a user for
transmission to said system. Typically such selection involves
selecting complete pages from said Web site
which are likely to be accessed by a user, unless said
complete pages violate a size constraint, which can be
specified in terms of a number of files, a quantity of
bytes, a time limit, or any other suitable criteria. Said
step of transmitting said selected portion of said identified
requested data further comprises transmitting said
selected portion or pages to said system.

Thus the data processing system described herein
has an improved mechanism for the distribution of
information from the Internet or other network to a large
number of data processing systems. This is accomplished
in a preferred embodiment by a server providing
access for multiple users to the World Wide Web, in
which selected pages from periodically updated Web
sites are precached. Pages linked to the home page for
a Web site which are likely to be accessed by a user are
retrieved and stored on the server. In response to off-line
browsing requests by subscribers to the Web site, the
pages or portions of pages which are not already
present in a subscriber's system are prioritized by likelihood
of being accessed utilizing statistical information,
link relationships, and/or content. The pages or page
portions most likely to be accessed are compressed and
transmitted to the subscriber, thus minimizing the connection
time required and maximizing the number of
subscribers which may be updated.

A preferred embodiment of the invention will now 
be described in detail by way of example only with ref- 
erence to the following drawings: 

Figure 1 depicts a distributed data processing system;
Figure 2 is a block diagram of a data processing
system which may be implemented as a server in
the distributed data processing system of Figure 1;
Figure 3 provides a pictorial representation of a data
processing system which may be implemented
as a user unit in the distributed data processing system
of Figure 1;
Figure 4 is a block diagram of the major components
of a data processing unit which may be implemented
as the user unit of Figure 3;
Figure 5 depicts a high level flowchart for a process
for precaching data at a server;
Figure 6 is a high level flowchart for a process for
transmitting precached downloads to a user unit;
Figure 7 depicts a high level flowchart for a process
for handling precached downloads received from a
server at a user unit; and
Figure 8 is a high level flowchart for a process for
retrieving data from a Web site or server cache.

With reference now to the figures, and in particular
with reference to Figure 1, a pictorial representation of
a distributed data processing system is depicted. User
units 102, 104, 106 and 108 have communications links
110, 112, 114 and 116, which provide these user units
access to the public switched telephone network
(PSTN) 118. Through these communications links, the
user units communicate with server 120, which is connected
to PSTN 118 by communications link 122. Server
120 provides user units 102-108 access to Internet 124
via communications link 126. In addition to providing user
units 102-108 access to Internet 124, server 120 also
stores various configuration information, passwords,
E-mail messages, and backup data on storage device
(SD) 128. User units 102-108 may be located in remote
geographical locations, such as California or New York.
Additionally, user units 102-108 may be located on other
continents on the globe.

Referring to Figure 2, a block diagram of a data
processing system which may be implemented as a
server, such as server 120 in Figure 1, is depicted. Data
processing system 200 may be a symmetric multiprocessor
(SMP) system including a plurality of processors
202 and 204 connected to a system bus 206. Also connected
to system bus 206 is memory controller/cache
208, which provides an interface to local memory 209.
I/O bus bridge 210 is connected to system bus 206 and
provides an interface to I/O bus 212. Memory controller/cache
208 and I/O bus bridge 210 may be integrated as
depicted.

Peripheral component interconnect (PCI) bus
bridge 214 connected to I/O bus 212 provides an interface
to PCI bus 216. A number of modems 218-220 may
be connected to PCI bus 216. Typical PCI bus implementations
will support four PCI expansion slots or add-in
connectors. Communications links to PSTN 118 depicted
in Figure 1 may be provided through modems
218-220 connected to PCI local bus 216 through add-in
boards. Modems 218-220 in the depicted example also
provide a connection to Internet 124 shown in Figure 1.

Additional PCI bus bridges 222, 224 provide interfaces
for additional PCI buses 226, 228, from which additional
modems may be supported. In this manner server
200 allows dial-ups by multiple user units simultaneously.
A memory mapped graphics adapter 230 and a
hard disk 232 may also be connected to I/O bus 212 as
depicted, either directly or indirectly.

Those of ordinary skill in the art will appreciate that
there are many possible variations in the hardware depicted
in Figure 2, which is not meant to imply architectural
limitations with respect to the present invention. For
example, other peripheral devices, such as optical disk
drives and the like, also may be used in addition to or in place
of the hardware depicted.

The data processing system depicted in Figure 2
may be, for example, an IBM RS/6000 system, a product
of International Business Machines Corporation in Armonk,
New York, running the AIX operating system. The
data processing system provides a platform for a Web
server, and may be one of a group of interconnected
servers employed by an Internet service provider to provide
access to Web clients or user units to the Internet.

The server data processing system in accordance
with a preferred embodiment includes a Web server program,
such as Netscape Enterprise Server Version 2.0,
a product of Netscape Communications Corporation in
Mountain View, California, which supports interface extensions.
The server thus contains a known set of server
application functions (SAFs) which accept a client's request
together with configuration information and return
a response. The server may also include an Application
Programming Interface (API) providing extensions enabling
application developers to extend or customize the
SAFs through software programs commonly known as
"plug-ins." The server supports off-line browsing by clients
and provides storage for precaching Web pages.
The server also implements and/or supports the processes
described below for selecting Web pages for precaching
and off-line downloading by clients.

With reference now to Figure 3, a pictorial representation
of a data processing system which may be implemented
as a user unit, such as user units 102-108 in
Figure 1, is depicted. Figure 3 is a pictorial representation
of the data processing system as a whole. Data
processing system 300 in the depicted example provides,
with minimal economic costs for hardware to the
user, access to the Internet. Data processing system
300 includes a data processing unit 302. Data processing
unit 302 is preferably sized to fit in typical entertainment
centers and provides all required functionality
conventionally found in personal computers, to enable
a user to "browse" the Internet. Additionally, data
processing unit 302 may provide other common functions
such as, for example, serving as an answering machine,
transmitting or receiving facsimile transmissions,
or providing voice mail facilities.

Data processing unit 302, a Web appliance, is con- 
nected to television 304 for display of graphical informa- 
tion. Television 304 may be any suitable television, al- 
though color televisions with an S-Video input will pro- 
vide better presentations of the graphical information. 
Data processing unit 302 may be connected to televi- 
sion 304 through a standard coaxial cable connection. 
A remote control unit 306 allows a user to interact with 
and control data processing unit 302. Remote control 
unit 306 emits infrared (IR) signals, preferably modulat- 
ed at a different frequency than the normal television, 
stereo, and VCR infrared remote control frequencies in 
order to avoid interference. Remote control unit 306 pro- 
vides the functionality of a pointing device in conven- 
tional personal computers, including the ability to move 
a cursor on a display and select items. 

Referring now to Figure 4, a block diagram for the 
major components of data processing unit 302 in ac- 
cordance with a preferred embodiment is portrayed. As 
with conventional personal computers, data processing 
unit 302 includes a motherboard 402 containing a proc- 
essor 404 and memory 406 connected to system bus 
408. Processor 404 is preferably at least a 486 processor
operating at or above 100 MHz. Memory 406 includes
read only memory (ROM) 406a containing a basic
input/output services (BIOS) routine and may include
cache memory and/or video RAM. 

Video/TV converter 410 on motherboard 402 and 
connected to system bus 408 generates computer video 
signals for computer monitors, a composite television 
signal, and an S-Video signal. The functionality of video/ 
TV converter 410 may be provided utilizing commercial- 
ly available video and converter chips. Keyboard/re- 
mote control interface unit 412 on motherboard 402 re- 
ceives keyboard codes through controller 414, regard- 
less of whether a wired keyboard/pointing device or an 
infrared keyboard/remote control is being employed. In- 
frared remote control unit 306 transmits signals which 
are ultimately sent to the serial port as control signals 
generated by conventional mouse or pointing device 
movements. Two buttons on remote control unit 306 are
interpreted identically to the two buttons on a conventional
mouse, while the remainder of the buttons transmit
signals corresponding to keystrokes on an infrared
keyboard. Thus, remote control unit 306 has a subset
of the functions provided by an infrared keyboard. Connectors/indicators
416 on motherboard 402 provide the
connections and indicators on data processing unit 302
described above.

External to motherboard 402 in the depicted example
are power supply 418, hard drive 420, modem 422,
and speaker 424. Power supply 418 is a conventional
power supply except that it receives a control signal from
controller 414 which effects shut down of all power to
motherboard 402, hard drive 420, and modem 422. In
some recovery situations, removing power and rebooting
is the only guaranteed method of resetting all of
these devices to a known state. Thus, power supply 418,
in response to a signal from controller 414, is capable
of powering down and restarting data processing unit
302.

Hard drive 420 contains operating system and applications
software for data processing unit 302, which
preferably includes: IBM DOS 7.0, a product of International
Business Machines Corporation in Armonk, New
York; Windows 3.1, a product of Microsoft Corporation in
Redmond, Washington; and Netscape Navigator, a
product of Netscape Communications Corporation in
Mountain View, California. Data may also be stored on
hard drive 420. Modem 422, inserted into a slot mounted
sideways on motherboard 402, is preferably a 33.6 kbps
modem supporting the V.42bis, V.34bis, V.34, V.17 Fax,
MNP 1-5, and AT command sets. Hard drive 420 may
also store data, such as a list of favorite Internet sites or
unviewed downloads from an Internet site. Additionally,
hard drive 420 contains instructions necessary to establish
a communications link with a service provider and
initiate a configuration process for the data processing
system.

Controller 414 is preferably one or more of the 805x
family controllers. Controller 414 is continuously powered
and, when data processing unit 302 is turned on,
monitors the system for a periodic "ping" indicating that
data processing unit 302 is operating normally. In the
event that controller 414 does not receive a ping within
a prescribed timeout period, controller 414 removes
power from the system and restarts the system. This
may be necessary, for example, when the system experiences
a general protection fault. If multiple attempts to
restart the system prove unsuccessful, controller 414
shuts off data processing unit 302 and signals that service
is required through indicators 416. Thus, data
processing unit 302 is capable of self-recovery in some
circumstances without involvement by a user.
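
The supervision behaviour described above can be modelled, for illustration only, as a small state machine. The Python sketch below is not controller firmware; the class name, the `timeout` and `max_restarts` parameters, and the returned action strings are all assumptions, since the description gives neither the timeout period nor the number of restart attempts.

```python
class Watchdog:
    """Toy model of ping supervision with a bounded number of restarts."""

    def __init__(self, timeout, max_restarts):
        self.timeout = timeout            # assumed maximum gap between pings
        self.max_restarts = max_restarts  # assumed limit on restart attempts
        self.last_ping = 0.0
        self.restarts = 0
        self.service_required = False

    def ping(self, now):
        """Record a ping; a healthy system clears the restart count."""
        self.last_ping = now
        self.restarts = 0

    def check(self, now):
        """Return 'ok', 'restart', or 'service_required' for time `now`."""
        if self.service_required:
            return "service_required"
        if now - self.last_ping <= self.timeout:
            return "ok"
        if self.restarts >= self.max_restarts:
            self.service_required = True
            return "service_required"
        self.restarts += 1
        self.last_ping = now  # the restarted system gets a fresh window
        return "restart"
```

A caller would invoke `check` periodically and cycle power on `'restart'`, or light the service indicator on `'service_required'`.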

Controller 414 also receives and processes input
from infrared remote control 306, infrared keyboard,
wired keyboard, or wired mouse. When one keyboard
or pointing device is used, all others are locked out (ignored)
until none have been active for a prescribed period.
Then the first keyboard or pointing device to generate
activity locks out all others. Controller 414 also directly
controls all LED indicators except that indicating
modem use and specifies the boot sector selection during
any power off-on cycle.
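
The lock-out rule for input devices may be illustrated with the following Python sketch. The class, the device labels, and the `idle_period` parameter are hypothetical. The sketch also assumes that only activity from the device currently holding the lock resets the idle timer, a point the description leaves open.

```python
class InputArbiter:
    """First active device wins until the lock has been idle long enough."""

    def __init__(self, idle_period):
        self.idle_period = idle_period  # assumed "prescribed period"
        self.owner = None
        self.last_activity = None

    def accept(self, device, now):
        """Return True if input from `device` at time `now` is processed."""
        if self.owner is not None and now - self.last_activity >= self.idle_period:
            self.owner = None  # lock released after the idle period
        if self.owner is None:
            self.owner = device  # first device to generate activity takes the lock
        if device != self.owner:
            return False  # locked out (ignored)
        self.last_activity = now
        return True
```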

Those skilled in the art will recognize that there are 
many possible variations in the components depicted in 
Figures 3 and 4 for specific applications or embodiments
which remain within the scope of the present invention. 

With reference now to Figure 5, a high level flowchart
for a process for precaching data at a server in
accordance with a preferred embodiment is depicted. 
The process may be executed by a Web server during 
off-peak hours or may be continuously run throughout a 
day or a span of several days as a background applica- 
tion. 

The process depicted begins at step 502, which de- 
picts a timed "wakeup" or automatic initiation of the proc- 
ess based on the server system clock or some other 
event within the server. The timing of the download ini- 
tiation may be coordinated with the scheduling of auto- 
matic downloads by user units. For example, if the user 
units utilizing the server are all configured to make au- 
tomatic downloads between 3:00 a.m. and 5:00 a.m., 
the server may be configured to start the process at approximately
1:00 a.m. so that the necessary downloads
to the server are complete in time for the user unit down- 
loads. Alternatively, the respective automatic down- 
loads may be overlapped, with individual downloads to 
the server being completed prior to requests from the 
user units. This alternative may be appropriate if the 
process is run as a background application. 

The process then passes to step 504, which illus- 
trates selecting an item from a registration list for data 
to be precached at the server. The registration list con- 
tains identifications of information which clients or user 
units download on a periodic basis (e.g., every night or 
once a week) together with an associated number of us- 
ers currently registered for each identified information. 
The registration list may contain, for example, a list of 
URLs for various Web pages frequently requested by a 
user or client. 

The registration list may be generated by specific 
user requests that entries be added to the registration 
list, by monitoring user transfers for periodic transfer 
from the same source, or both. For example, the regis- 
tration list may be generated by compiling specific off- 
peak information retrieval requests from clients employ- 
ing the server at which Web data is to be precached, or 
by examining the "bookmarks" or "favorites" lists for a 
client's Web browser. Alternatively, a client's Web
browser may be configured so that adding a URL to a 
bookmark list initiates a query to the client regarding 
adding the URL to an off-peak retrieval list. 

The registration list may be maintained by monitoring
actual user transfers and comparing them to registration
requests, decrementing the request number associated with
an item when a user who requested that item does not
download the item for longer than a threshold period
(e.g., a month). In this manner, requested items which
have been "abandoned" or unused by a requesting user may
be culled from the registration list. For example, a user
unit may monitor whether downloads are viewed and, after a
period of time during which a specific download is not
viewed, terminate automatic download of such data.
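
The culling of abandoned entries described above can be sketched as follows. This is only an illustration under stated assumptions: the `Registration` record, the `cull` function, and the 30-day `ABANDON_AFTER` value are hypothetical names and values chosen for the example (the text offers "a month" only as an example threshold).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed threshold period; the description gives "a month" only as an example.
ABANDON_AFTER = timedelta(days=30)


@dataclass
class Registration:
    """One registration-list entry (hypothetical structure)."""
    url: str
    # Maps a user id to the last time that user actually downloaded the item.
    last_download: dict[str, datetime] = field(default_factory=dict)

    @property
    def user_count(self) -> int:
        # The "request number" associated with the item.
        return len(self.last_download)


def cull(registrations: list[Registration], now: datetime) -> list[Registration]:
    """Drop users idle longer than ABANDON_AFTER, then drop items with no users."""
    kept = []
    for reg in registrations:
        reg.last_download = {
            user: seen
            for user, seen in reg.last_download.items()
            if now - seen <= ABANDON_AFTER
        }
        if reg.user_count > 0:
            kept.append(reg)
    return kept
```

Removing an idle user from an entry has the same effect as decrementing the request number for that item; an entry whose count reaches zero leaves the list.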

From step 504, the process passes to step 506, which
depicts a determination of whether the number of clients
or user units registered for periodic downloads of the
item selected meets a threshold. The threshold is
determined by the efficiency of precaching the downloads
for the selected item. For example, if only a single user
is registered for a specific download, it may be more
efficient to simply allow that user to download the
requested data directly, rather than precaching the
download at the server. The threshold for individual items
on the registration list may be set depending on whether
the item was specifically requested for off-peak
information retrieval by a user or was merely added to the
registration list based on a frequency of requests for
this item.

If fewer than the threshold number of users are
registered for the selected item, the process proceeds to
step 507, which depicts a determination of whether all
entries have been checked. If so, the process proceeds to
step 514, described below. If not, however, the process
returns to step 504 for selection of a different item.

If at least the threshold number of users are registered
for the selected item, the process proceeds to step 508,
which illustrates fetching the data identified by the
selected item, compressing the data, and storing it at the
server. Any suitable compression utility may be utilized
for compressing the data.

In a preferred embodiment, the data fetched comprises
Web pages from a Web site on the Internet. The number of
Web pages retrieved is determined by rules or heuristics
for deciding which Web pages and subpages are most likely
to be accessed by a user. The Web pages downloaded each
consist of a number of files or components. Therefore,
Web page components from a prior download which are stored
on the server but which have been updated or are no longer
referenced in the Web pages retrieved may be deleted as
part of this step. Web pages obtained from different sites
may be compressed and stored separately to facilitate
distribution to each user of only the pages requested by
that user. Thus each server acts as a mirror site only for
the specific Web sites requested for off-peak retrieval by
users utilizing that server.

The process then passes to step 510, which depicts
updating a download list containing a list of items
precached at the server for client downloads. The download
list may contain other information, such as the time and
date of each downloaded component. The process next passes
to step 512, which illustrates a determination of whether
all entries or items in the registration list have been
checked. If not, the process returns to step 504 for
selection of another item in the registration list. If so,
however, the process proceeds to step 514, which depicts
the process becoming idle until the next timed precache
download is initiated.
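
One pass of the Figure 5 process can be sketched roughly as below. The function `precache_pass`, its parameters, and the use of `zlib` as the compression utility are assumptions made for illustration; the description leaves the fetch mechanism and the compression utility open.

```python
import zlib
from typing import Callable


def precache_pass(
    registration_list: dict[str, int],   # item URL -> number of registered users
    threshold: int,                      # minimum users that justifies precaching
    fetch: Callable[[str], bytes],       # hypothetical fetcher supplied by the caller
    cache: dict[str, bytes],             # server-side store of compressed data
    download_list: set[str],             # items currently precached at the server
) -> None:
    """Run one timed pass over the registration list (steps 504 to 512)."""
    for url, users in registration_list.items():   # steps 504, 507 and 512
        if users < threshold:                       # step 506
            continue                                # too few users: leave to direct download
        cache[url] = zlib.compress(fetch(url))      # step 508: fetch, compress, store
        download_list.add(url)                      # step 510: update the download list
    # Step 514: the caller idles until the next timed wakeup (step 502).
```

Iterating over the whole list stands in for the two "all entries checked" tests at steps 507 and 512, which both simply return to step 504 until the list is exhausted.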

Referring to Figure 6, a high level flowchart for a 
process for transmitting precached downloads to a user 
unit (or Web client) in accordance with a preferred em- 
bodiment is portrayed. The process begins at step 602, 
which depicts a server receiving a download request, 
such as an automatic, timed off-peak retrieval request 
for a particular Web page or an on-line request made at 
dialup. The request may contain an URL for a particular 
Web site, as described below. The process passes next 
to step 604, which illustrates checking the download list 
of precached downloads at the server and then to step 
606, which depicts a determination of whether the
requested download is precached at the server. If not, the
process proceeds to step 608, which illustrates
transmitting the download requested to the appropriate
location. The process then passes to step 616, which depicts
the process becoming idle until the next download re- 
quest is received. 

If the requested download is precached at the serv- 
er, the process passes instead to step 610, which illus- 
trates interception of the download request by the serv- 
er. The process next passes to step 612, which depicts 
transmission of the requested download data from the 
server to the requesting user unit or Web client. The re- 
quested data is stored in a local memory, such as a hard 
disk drive, in the user unit. The process then passes to 
step 614, which illustrates the user unit automatically 
decompressing the downloaded data. The process then 
passes to step 616, which depicts the process becoming
idle until the next download request is received. 
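
The branch taken in Figure 6 can be illustrated with a short sketch. The names `handle_request`, `client_receive`, and the `forward` callback are hypothetical, and `zlib` is again assumed as the compression utility.

```python
import zlib
from typing import Callable


def handle_request(
    url: str,
    download_list: set[str],             # items precached at the server
    cache: dict[str, bytes],             # compressed precached data, keyed by URL
    forward: Callable[[str], bytes],     # hypothetical pass-through to the origin
) -> bytes:
    """Serve a download request from the precache when possible."""
    if url in download_list:             # steps 604 and 606: check the download list
        return cache[url]                # steps 610 and 612: intercept, send cached data
    return forward(url)                  # step 608: pass the request on


def client_receive(payload: bytes) -> bytes:
    """Step 614: the user unit decompresses precached data on receipt."""
    return zlib.decompress(payload)
```

In this sketch only the precached path returns compressed data; a real user unit would need some indication of which path was taken before decompressing.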

With reference now to Figure 7, a high level flowchart
for a process for handling precached downloads
received from a server at a user unit in accordance with 
a preferred embodiment is depicted. The process be- 
gins at step 702, which illustrates the user unit receiving 
a precached download from the server. The process 
then passes to step 704, which illustrates deleting the 
previous download of a similar nature (i.e., older versions
of the downloaded Web page components) received 
from the server. The process passes next to step 706, 
which depicts automatically decompressing the down- 
load at the time it is received to speed viewing. The proc- 
ess then passes to step 708, which illustrates the proc- 
ess becoming idle until another download is received. 

Referring to Figure 8 (described in more detail be- 
low), a high level flowchart for a process for retrieving 
data from a Web site or server cache in a preferred em- 
bodiment is portrayed. The process illustrated is em- 
ployed, as a whole or in part, to retrieve information from 
a Web site for precaching at a Web server as depicted 
in step 508 of Figure 5 or from a Web server for storage 
on a Web client's hard drive as described in connection 
with step 612 in Figure 6. 



The process for retrieval of information for precaching
at a server requires more than a simple fetch, and must be
adapted to the realities of off-peak information
retrieval. It is anticipated that user requests for
off-peak retrieval will far exceed the capacity of the
bandwidth, time, and resources available to satisfy such
requests. That is, the requests, if satisfied, would
consume more resources than could practically be made
available. Thus, the process of retrieving data for
precaching must analyze the requests and the data
requested and fetch data in an intelligent manner.

Web pages, or hypertext documents, are retrieved
through a URL identifying the communications source for
the page. The URL is typically of the form
"www.domainname.ext/filepath/filename". The domain name
and extension identify a specific Web site (Web domain or
server) containing the requested information. The
requested information will comprise a file or a group of
files organized within directories on the Web site which
is the subject of the request. Thus, the URL must include
a path to the files containing the information requested
and may also require a filename. No extension need be
specified for the filename since only HTML files are
displayed by the browser and a default extension of
".html" or ".htm" is assumed. If no filename is specified
in the request, the browser searches for an HTML file
named "default" or "index" at the specified Web domain and
path.

Web pages at a particular location (or domain) comprise
an HTML file or plurality of HTML files together with
associated graphics, sound, motion video, and executable
script files. An HTML file forming part of a Web page will
frequently include references to graphics files such as
images in JPEG or GIF format, sound files such as audio
information in WAV or MIDI format, motion video files such
as video information in MPEG format, and/or executable
script files such as JAVA, JAVASCRIPT or Common Gateway
Interface (CGI) script files. More importantly, an HTML
file will typically contain "links", or embedded
references including URLs for "jumping" to (or in reality,
retrieving) other HTML files. These other HTML files may
be local (located at the same Web domain, although perhaps
at a different path) or remote (located on a different Web
domain or server).

Display of a Web page by a browser requires retrieval
of at least one HTML file formatting the Web page to be
displayed and each graphics, sound, motion video, and
script file referenced in the HTML file(s). Additionally,
effectively caching a Web page requires that links within
the page be resolved and data retrieved for display. The
pages referenced by the links may themselves contain links
to still other pages, and so on. In this manner a single
offline browsing request could conceivably request a page
containing the root of a link "tree" which, if fully
expanded, would include virtually every Web page currently
published. Moreover, a client's off-peak retrieval time
may be limited for the reasons described above. For these
reasons, the process for retrieving data to or from a
server cache must be intelligently implemented.

The process depicted begins at step 802, which de- 
picts receiving a fetch request. The fetch request may 
be received by the server in an off-peak retrieval request 
by a client, or as part of a retrieval process for precach- 
ing by the server. The process then passes to step 804, 
which illustrates determining the information to be
retrieved or transmitted pursuant to the request. The
information retrieved from a Web site for caching at a
server or transmitted from a server cache to a client is
selected using a set of rules or heuristics to identify
files most likely to satisfy a user's interest without
inordinately taxing available resources.

In the server caching context, the rules for deter- 
mining which Web site files to retrieve for caching at the 
server are driven by the twin goals of obtaining a span 
of pages likely to interest a client and restricting the 
cache data to an appropriate size given the total cache 
size available. A broad sampling of the files associated 
with a Web page should be retrieved without devoting 
an unreasonable amount of system resources to follow- 
ing a specific series, or path, of links. 

For example, if a particular news site were of
interest to users, the system would initially retrieve the
news site's initial or default HTML file and graphics,
scripts, etc. referenced within that file. The set of files 
comprising the default HTML file and closely associated 
graphics, scripts, etc. is sometimes also called a "home 
page". In general, any HTML file together with the 
graphics, sound, motion video, and/or script files refer- 
enced within the text of the HTML file may be referred 
to as a "page". The same graphics, sound, motion video, 
and/or script files may be associated with or referenced 
by more than one HTML file and therefore may be found 
in more than one page. In contrast to graphics, sound, 
motion video, and/or script files, references to separate 
HTML files, or links within a page, are references to dis- 
tinct pages. In general, when retrieving Web site infor- 
mation for caching, complete pages are preferably re- 
trieved rather than partial pages (e.g., ignoring sound 
files), unless size constraints would be violated (e.g., the 
page includes unusually large motion video files). 

Next, the system would begin resolving links within 
the news site's home page, following these links to other 
pages and retrieving those pages. Links within these 
second level pages are followed, and the process con- 
tinues recursively until the link tree originating with the 
home page is fully exhausted or a threshold is exceed- 
ed. The threshold may be determined by a number of 
files retrieved, a quantity of bytes retrieved, or a time 
spent in retrieving files. Since it will most commonly be 
the case that the selected threshold will be exceeded 
before the link tree is exhausted, a mechanism must be 
provided for identifying the most preferred pages within 
the link tree to retrieve. 

Many pages include facilities for monitoring the
number of users who access that specific page. This
statistical information may therefore be employed to
identify the most popular areas of a site for preferential
retrieval. A "breadth first" retrieval system may be
employed, either in conjunction with or in lieu of
employing statistical information to identify the most
popular pages. A "breadth first" system would retrieve all
pages, either as a whole or in similar amounts, from a
given level of a link tree before proceeding to a
subsequent level of the link tree. This is in contrast to
a "depth first" retrieval system, which would fully
exhaust all levels for a specific path within the link
tree before addressing branches from that path at various
levels. One "breadth first" method of caching is described
in copending, commonly assigned application entitled
"Method for Optimizing Off-Peak Caching of Data" by J.
Thompson and V. Berstis, Serial No. 08/797,902, filed 10
Feb 97 (EP application 98300847.5), which is hereby
incorporated by reference.
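
A generic breadth-first traversal with a retrieval threshold might look like the sketch below. It is not the method of the copending application, whose details are not given here; `breadth_first_fetch`, the `get_links` callback, and the choice of a page count as the threshold are assumptions for illustration (the description also allows byte or time limits).

```python
from collections import deque
from typing import Callable


def breadth_first_fetch(
    home: str,
    get_links: Callable[[str], list[str]],  # hypothetical: link URLs found in a page
    max_pages: int,                         # assumed threshold: number of pages
) -> list[str]:
    """Return pages in retrieval order, level by level, up to max_pages."""
    fetched: list[str] = []
    seen = {home}
    queue = deque([home])
    while queue and len(fetched) < max_pages:
        url = queue.popleft()               # oldest entry first: finishes a level
        fetched.append(url)                 # before descending to the next one
        for link in get_links(url):
            if link not in seen:            # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return fetched
```

With a home page linking to `a` and `b`, and `a` linking to `a1` and `a2`, a limit of four pages yields the home page, `a`, `b`, and `a1`. A depth-first traversal under the same limit would instead spend its budget below `a` before ever reaching `b`.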

Still another rule for page retrieval, which may be
implemented in conjunction with the systems described
above, may be to prefer pages at the same site (i.e.,
identified by the same domain name within the page URL) to
pages located at different sites. The rule may be extended
to prefer pages in the same directory at a given site to
pages in different directories. A site-based preference
allows filtration of so-called "superlink" pages from the
retrieval. Superlink pages contain links to a plethora of
sites in the Web, often serving as a resource locator for
a particular area of interest. Following all links within
a superlink page could quickly consume available system
resources. A page link count may also be employed in
conjunction with a site-based preference, ignoring links
within a page containing more than a threshold number of
links, for example, 100. Again, such a large number of
links would quickly exhaust available resources if fully
resolved.
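
The site-based and directory-based preferences, together with the link count filter, can be sketched as a simple ranking function. The function name `rank_links` and the three-level scoring are assumptions; only the example threshold of 100 links comes from the text.

```python
import posixpath
from urllib.parse import urlsplit

MAX_LINKS_PER_PAGE = 100  # example threshold given in the description


def rank_links(page_url: str, links: list[str]) -> list[str]:
    """Order a page's links: same directory, then same site, then other sites."""
    if len(links) > MAX_LINKS_PER_PAGE:
        return []                           # treated as a "superlink" page: ignore its links
    page = urlsplit(page_url)
    page_dir = posixpath.dirname(page.path)

    def score(link: str) -> int:
        target = urlsplit(link)
        if target.netloc != page.netloc:
            return 2                        # different site: least preferred
        if posixpath.dirname(target.path) != page_dir:
            return 1                        # same site, different directory
        return 0                            # same site and same directory

    return sorted(links, key=score)         # stable sort keeps ties in page order
```

A retrieval process could follow the returned links in order until its resource threshold is reached, so that off-site links are the first to be dropped.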

In the client retrieval context, the rules for
determining what information is to be transmitted to the
client during off-peak information retrieval are based on
similar goals of providing pages of interest to the client
and remaining within a threshold of allocated resources.
An overriding concern with minimizing connection time by
the off-peak retrieval is also present in the context of
downloading cached information to the users. This allows
larger numbers of clients to be fairly provided quality
service. In this regard, limitations on time and bandwidth
for off-peak retrieval may be more strict.

Due to such contextual differences, the rules for
determining what information is to be transmitted to the
client are somewhat different. For example, if the client
has never before downloaded a specific cached site, the
entire content of that site (every page) must be
downloaded. If the user is limited to one hour of off-peak
information retrieval per night, it may not be possible to
download all pages to the client in a single night.
Several nights may be required to download a specific
site. Therefore, a priority system must be established for
downloaded content.

It must be remembered that the Web pages for a
requested site are being effectively mirrored at the user
unit for off-line browsing. A distinction must be made
between users downloading a new site for the first time
and existing subscribers merely requiring an update. For a
large Web site, it may not be possible to download to a
first time subscriber all files required for the requested
site in a single night. Instead, it may be necessary to
download the Web site files to the new subscriber over a
period of several nights. In this context, it may be
preferable not to download complete pages, so that the
user may still be able to view some content of the
requested Web site while awaiting completion of the
off-peak retrieval for off-line browsing. For example, the
sound files associated with a page may be considered lower
priority than the HTML and associated graphics files.
Thus, all sound files for a given site may be assigned the
lowest download priority, downloaded on the last night.
Additionally, statistically based, breadth-first, and
site- or directory-based preferences as described above
may be employed for determining which pages are selected
for downloading.
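
A priority system that spreads a first-time download over several nights might be sketched as follows. The `PRIORITY` table, the `CachedFile` record, and the byte budget per night are hypothetical; the description states only that sound files may rank below HTML and graphics files.

```python
from dataclasses import dataclass

# Assumed priority classes; a lower number is downloaded earlier.
PRIORITY = {"html": 0, "graphics": 1, "sound": 2}


@dataclass
class CachedFile:
    name: str
    kind: str   # one of the keys of PRIORITY
    size: int   # bytes


def plan_nights(files: list[CachedFile], nightly_budget: int) -> list[list[str]]:
    """Group file names into per-night batches, highest priority first."""
    ordered = sorted(files, key=lambda f: PRIORITY[f.kind])
    nights: list[list[str]] = []
    current: list[str] = []
    used = 0
    for f in ordered:
        # Start a new night when the next file would overrun the budget.
        if current and used + f.size > nightly_budget:
            nights.append(current)
            current, used = [], 0
        current.append(f.name)
        used += f.size
    if current:
        nights.append(current)
    return nights
```

In this sketch a file larger than the nightly budget is still scheduled, in a night of its own. The budget is expressed in bytes for simplicity, although the description frames the limit as connection time.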

In the client download context, the goal of minimizing
connection time also elevates considerations of
information overlap. For example, if the same graphics
file is used in many pages for the specific site, as
might occur with a logo image, that graphics file may be 
assigned a higher download priority than other files for 
the same site. Another scenario invoking this consider- 
ation is whether the client has already downloaded in- 
formation for a given site, and merely requires an up- 
date, as might occur with a daily news site employing 
the same advertisements. Comparison of file dates and 
sizes in the server cache with those present on a client's 
system will reveal the changes which must be updated 
for the client. Alternatively, a listing of the files previously 
downloaded to the client may be maintained. This may 
be preferable since a listing of the services to which an 
individual client subscribes must be maintained in any 
event. Such listings also provide a resource for updating 
the registration list of sites to cache and for culling the 
files previously downloaded to the client. 

An expanded view of the process for step 804 for 
the client download context is depicted in Figure 8. In 
the client download context, the determination of what 
data to transmit begins with step 804a, which illustrates 
identifying the data already downloaded to the client, if 
any, for the requested Web site. The example depicted 
assumes that a list of files downloaded to the client for 
a given Web site is maintained, either on the client's
machine or the server. This may be compared to a list of
current files for the Web site, with changes identified by 
discrepancies in file name, date, or size. The process 
then passes to step 804b, which illustrates prioritizing 
data to be downloaded according to the rules described 
above or similar rules. 
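
The comparison in step 804a can be sketched as a difference between two file listings. The `FileInfo` record and the functions `files_to_send` and `files_to_delete` are hypothetical names; the comparison on name, date, and size follows the description.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    date: str   # last-modified date, assumed here to be an ISO date string
    size: int   # bytes


def files_to_send(server: dict[str, FileInfo], client: dict[str, FileInfo]) -> list[str]:
    """Names that are new to the client or differ in date or size."""
    return sorted(name for name, info in server.items() if client.get(name) != info)


def files_to_delete(server: dict[str, FileInfo], client: dict[str, FileInfo]) -> list[str]:
    """Names the client holds that the cached site no longer contains."""
    return sorted(name for name in client if name not in server)
```

The list returned by `files_to_send` is the input to the prioritization of step 804b, while `files_to_delete` supports culling files previously downloaded to the client.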

The process depicted may be employed as part of a
multicasting process, where the same stream of information
is provided to different recipients simultaneously. It may
be, for example, that a number of users subscribe to
off-peak retrieval for the same news site and therefore
require the same update. If the individual download
processes may be coordinated to receive the same stream of
information, a single server process may be employed to
update each respective user. Multicasting may be
particularly useful where there are multiple user units in
a single household.

Once the data to be transmitted or retrieved is
identified in step 804, the process passes to step 806,
which depicts retrieving the identified information and
compressing it for storage at the server or extracting the
identified information from a compressed cache file,
compressing it, and transmitting it to the client. In the
client download context, the concern with minimizing
connection time also elevates the importance of
compression in transmitting cached data to the client.
Compressing a body of Web site information in a single
file at the server is not a serious impediment to
selective transmission of pages or files from that Web
site information to the client. Known algorithms allow
files to be extracted from compressed archives and
compressed on the fly during transmission, so that only
selected pages or files from the information need be
transmitted, and may be transmitted in compressed form to
reduce connection time.
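
Selective extraction from a single compressed archive, with recompression for transmission, can be illustrated using Python's standard `zipfile` and `zlib` modules. The ZIP format and the function names `build_archive` and `send_selected` are assumptions for this sketch; the description does not name a particular archive format or algorithm.

```python
import io
import zipfile
import zlib


def build_archive(files: dict[str, bytes]) -> bytes:
    """Store a whole cached site as one compressed archive (server side)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def send_selected(archive: bytes, wanted: list[str]) -> dict[str, bytes]:
    """Extract only the selected files and compress each for transmission."""
    out: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in wanted:
            out[name] = zlib.compress(zf.read(name))  # extract, then recompress
    return out
```

Because each archive member can be read individually, files that were not selected are never decompressed or sent.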

It is anticipated that Web sites will eventually
implement cache-optimized pages. For example, a Web site
may configure pages with knowledge of the rules used to
prioritize caching and downloads, creating cache-optimized
pages for preferential caching and downloading to clients.
Web publishers may alternatively include comments
identifying which pages and/or files are part of the same
Web "publication" and which links reference pages for
distinct publications. Web sites which update pages on a
periodic basis may compress the updated or changed files
for the Web site in a single bundle for efficient
retrieval. Where a server provides off-line subscriptions
to such a Web site, the server need only act as a mirror
for the compressed, changed files.

Thus the approach described herein provides an
efficient means of distributing data from a plurality of
sources to a plurality of destinations in situations where
all of the data necessarily passes through the same node
during the separate transfers. The bandwidth requirements
from a server to the Internet can be decreased by
precaching or "mirroring" information requested by
multiple users at the server. The requested information is
compressed to occupy less space at the server and to speed
transfer to the user.

Although the present invention has been primarily
described in the context of a fully functional data
processing system, those skilled in the art will
appreciate that appropriate software in the form of a
computer readable medium with instructions, including
recordable type media such as floppy disks and CD-ROMs and
transmission type media such as digital and analog
communication links, may be used to implement the system
and method of the invention.



Claims 

1. A method for efficient distribution of precached data
from a server to a plurality of user client systems,
comprising:

receiving a request for data from a client;

identifying requested data which is not already present
in said client;

selecting a portion of said identified requested data
based on a probability that a user will access said
selected portion of said identified requested data,
wherein said selected portion of identified requested
data is subject to a size constraint; and

transmitting to the client in compressed form said
selected portion of identified requested data.

2. The method of claim 1, wherein said step of receiving
a request for data from a client comprises receiving an
off-peak information retrieval request for a Web site.

3. The method of claim 1, wherein said step of receiving
a request for data from a client comprises receiving a
browsing request for a Web site from a registration list
of browsing requests for a plurality of Web sites.

4. The method of claim 3, further comprising collecting
a registration list of browsing requests for a plurality
of Web sites.

5. The method of claim 4, wherein said step of collect- 
ing a registration list of browsing requests further 
comprises culling said registration list to eliminate 
abandoned browsing requests. 

6. The method of claim 1, wherein said step of receiving
a request for data comprises receiving a request for at
least one page of data from a Web site.

7. The method of claim 6, wherein said step of selecting
a portion of said identified requested data for
transmission to said system further comprises selecting
pages linked to said at least one page of data for
transmission to said system.

8. The method of claim 6 or 7, wherein said step of
identifying requested data which is not already present in
said client further comprises identifying pages linked to
said requested page from said Web site which are not
already present in said client; and said step of selecting
a portion of said identified requested data for
transmission to said client further comprises selecting
pages linked to said requested page from said Web site
which are likely to be accessed by a user.

9. The method of claim 6 or 7, wherein said step of
identifying requested data further comprises identifying
portions of pages linked to said at least one page which
are not already in said client; said step of selecting a
portion of said identified requested data further
comprises selecting portions, not already in said client,
of pages linked to said at least one page which are likely
to be accessed by a user; and said step of transmitting
said selected portion of said identified requested data
further comprises transmitting said selected portions of
pages to said client.

10. The method of claim 1, wherein said step of receiving
a request for data from a client comprises receiving a
request for data from a Web site; and said step of
selecting a portion of said identified requested data for
transmission to said client further comprises selecting a
portion of said data from said Web site which is not
already present in said client for transmission to said
client.

11. The method of claim 10, wherein said step of selecting
a portion of said data from said Web site further
comprises selecting complete pages from said Web site
which are not already present in said client and are
likely to be accessed by a user up to a size constraint
selected from the group consisting of number of files,
quantity of bytes, and a time limit.

12. The method of any preceding claim, further comprising:

receiving a second request for said data from said client
after a period of time;

identifying remaining data from said identified requested
data which is not already present in said client;

selecting a portion of said remaining data for
transmission to said client; and

transmitting said selected remaining data to said client
in compressed form.

13. The method of any preceding claim, further comprising
compressing said selected portion of said identified
requested data for transmission to said client.

14. An apparatus for efficient distribution of precached
data to a plurality of user client systems, comprising:

receiving means for receiving a request for data from a
client;

identifying means for identifying requested data which is
not already present in said client;

selection means for selecting a portion of said
identified requested data based on a probability that a
user will access said selected portion of said identified
requested data, wherein said selected portion of
identified requested data is subject to a size
constraint; and

transmission means for transmitting said selected portion
of said identified requested data to said client in
compressed form.



15. A computer program product for use with a data
processing system, comprising:

a computer usable medium;

first instructions on said computer usable medium for
receiving a request for data from a client system;

second instructions on said computer usable medium for
identifying requested data which is not already present
in said client system;

third instructions on said computer usable medium for
selecting a portion of said identified requested data for
transmission to said client system; and

fourth instructions on said computer usable medium for
transmitting said selected portion of said identified
requested data to said client system in compressed form.

16. A method of precaching Web pages, comprising:

maintaining a registration list of requested Web site
home pages;

for each requested Web site home page, creating a cached
Web site;

responsive to a request for a Web site home page within
said registration list, identifying a changed portion of
a corresponding cached Web site; and

transmitting a portion of said changed portion of a
corresponding cached Web site, said transmitted portion
being subject to a size constraint.

17. The method of claim 16, wherein said step of creating
a cached Web site further comprises:

identifying pages linked to said Web site home page which
are likely to be accessed by a user;

retrieving said requested Web site home page and said
identified pages linked to said Web site home page; and

compressing said requested Web site home page and said
identified pages linked to said Web site home page.



18. The method of claim 17, wherein said step of iden- 
tifying pages linked to said Web site home page 
which are likely to be accessed by a user further 
comprises: 

identifying all pages referenced by said Web 
site home page which are most frequently ac- 
cessed by users; and 

identifying pages referenced by a page refer- 
enced by said Web site home page, wherein a 
breadth first priority is employed to identify pag- 
es likely to be accessed by a user. 

19. The method of claim 16, 17 or 18, wherein said step
of transmitting a portion of said changed portion of a
corresponding cached Web site comprises transmitting
complete pages unless a page exceeds said size constraint.

20. A method of efficiently distributing data to a
plurality of users, comprising:

receiving requests for data from the plurality of users;

based on the received requests, selecting data likely to
be accessed by the plurality of users from a larger pool
of available data;

precaching the selected data at a server system; and

upon connection of an individual user within the
plurality of users to the server system, transmitting a
portion of data from the precached data to the individual
user in compressed form, the portion of precached data
selected on the basis of a prior received request from
the individual user.